1 Compare the flipper length of penguins using statistical tests
First, create subsets of the flipper length of all species as vectors:
library(tidyverse)adelie <-filter(penguins, species =="Adelie")$flipper_len
Warning: Unknown or uninitialised column: `flipper_len`.
gentoo <-filter(penguins, species =="Gentoo")$flipper_len
Warning: Unknown or uninitialised column: `flipper_len`.
chinstrap <-filter(penguins, species =="Chinstrap")$flipper_len
Warning: Unknown or uninitialised column: `flipper_len`.
1.1 Step 1: Test all groups for normality
With Shapiro-Wilk
The Shapiro-Wilk test suggests that Adelie and Chinstrap are normally distributed, but Gentoo is not.
shapiro.test(adelie)
Shapiro-Wilk normality test
data: adelie
W = 0.99339, p-value = 0.72
shapiro.test(gentoo)
Shapiro-Wilk normality test
data: gentoo
W = 0.96219, p-value = 0.00162
shapiro.test(chinstrap)
Shapiro-Wilk normality test
data: chinstrap
W = 0.98891, p-value = 0.8106
With QQ-plots
library(patchwork) # using patchwork to combine multiple plots into onech <- ggpubr::ggqqplot(chinstrap) +labs(title ="chinstrap")ad <- ggpubr::ggqqplot(adelie) +labs(title ="adelie")ge <- ggpubr::ggqqplot(gentoo) +labs(title ="gentoo")ch + ge + ad
1.2 Compare Chinstrap vs. Adelie
Both are normally distributed, so we can compare variances for Chinstrap and Adelie penguins:
var.test(chinstrap, adelie)
F test to compare two variances
data: chinstrap and adelie
F = 1.1894, num df = 67, denom df = 150, p-value = 0.3854
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.8022092 1.8217141
sample estimates:
ratio of variances
1.189396
This indicates that the variances do not differ between Chinstrap and Adelie penguins. Therefore, we can use a t-test to compare mean flipper length:
t.test(chinstrap, adelie)
Welch Two Sample t-test
data: chinstrap and adelie
t = 5.7804, df = 119.68, p-value = 6.049e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.859244 7.880530
sample estimates:
mean of x mean of y
195.8235 189.9536
The t-test suggests, that the mean flipper length differs between Chinstrap and Adelie penguins.
1.3 Compare Chinstrap vs. Gentoo
Gentoo is not normally distributed, so we can use a Wilcoxon-rank-sum test to compare means:
wilcox.test(chinstrap, gentoo)
Wilcoxon rank sum test with continuity correction
data: chinstrap and gentoo
W = 92, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
The Wilcoxon tests suggests that the flipper lengths differ between Chinstrap and Gentoo penguins.
1.4 Compare Adelie vs. Gentoo
Gentoo is not normally distributed, so we can use a Wilcoxon-rank-sum test to compare means:
wilcox.test(adelie, gentoo)
Wilcoxon rank sum test with continuity correction
data: adelie and gentoo
W = 26, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
The Wilcoxon tests suggests that the flipper lengths differ between Adelie and Gentoo penguins.
1.5 Visual comparison
The boxplot supports the findings of the statistical tests. Notches do not overlap.
penguins |>ggplot(aes(x = species, y = flipper_len)) +geom_boxplot(notch =TRUE)
Error in `geom_boxplot()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'flipper_len' not found
Similarly the standard errors of the means do not overlap between the species.
penguins |>ggplot(aes(x = species, y = flipper_len)) +stat_summary()
Error in `stat_summary()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'flipper_len' not found
To add the significance levels of the tests performed, I used the geom_signif() layer provided by the ggsignif package.
Since I performed two different tests (t-test for the comparison of Chinstrap and Adelie penguins, Wilcoxon test for the other two comparisons), I added two different significance layers that distinguish the two tests.